A Data-Driven Dependency Parser for Bulgarian
Abstract
One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representations and employs a deterministic algorithm to construct dependency structures in a single pass over the input string, guided by a memory-based classifier at each nondeterministic choice point, as described in Nivre et al. (2004). Since the original BulTreeBank annotation is based on HPSG, it has been necessary to extract dependency structures from the original annotation both for training and evaluating the parser.

The paper is structured as follows. Section 2 introduces the MaltParser system used in the experiments to induce parsers from treebank data. Section 3 presents BulTreeBank, and section 4 shows how the BulTreeBank annotation can be converted into a dependency structure annotation of the kind required by MaltParser. Section 5 describes the experimental conditions, and section 6 discusses the results of the experiments. Section 7 contains our conclusions.
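To make the idea of deterministic, classifier-guided parsing concrete, here is a minimal sketch of a single-pass dependency parser. It is not MaltParser's code. It assumes an arc-eager transition system (shift, reduce, left-arc, right-arc), and the `guide` function is a hypothetical stand-in for the memory-based classifier: it is consulted at every choice point and returns the next transition. For illustration, `gold_guide` derives the transitions from a known tree, which is roughly how training data for such a classifier could be produced. All names and the example sentence are assumptions, not taken from the paper.

```python
from typing import Callable, Dict, List

SHIFT, REDUCE, LEFT_ARC, RIGHT_ARC = "SH", "RE", "LA", "RA"

# A guide looks at the stack, the remaining input and the arcs built so far,
# and returns one transition. In a data-driven parser this is a trained classifier.
Guide = Callable[[List[int], List[int], Dict[int, int]], str]


def parse(n_tokens: int, guide: Guide) -> Dict[int, int]:
    """Parse tokens 1..n_tokens in one left-to-right pass.

    Token 0 is an artificial root. Returns a dict mapping each token to its head.
    """
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_tokens + 1))
    heads: Dict[int, int] = {}
    while buffer:
        action = guide(stack, buffer, heads)
        top, nxt = stack[-1], buffer[0]
        if action == LEFT_ARC and top != 0 and top not in heads:
            heads[top] = nxt              # next input token becomes head of stack top
            stack.pop()
        elif action == RIGHT_ARC:
            heads[nxt] = top              # stack top becomes head of next input token
            stack.append(buffer.pop(0))
        elif action == REDUCE and top in heads:
            stack.pop()                   # stack top already has a head; discard it
        else:
            stack.append(buffer.pop(0))   # SHIFT, also the fallback for invalid actions
    # Tokens that never received a head are attached to the artificial root.
    for token in range(1, n_tokens + 1):
        heads.setdefault(token, 0)
    return heads


def gold_guide(gold: Dict[int, int]) -> Guide:
    """Stand-in for the classifier: read the correct transition off a known tree."""
    def guide(stack: List[int], buffer: List[int], heads: Dict[int, int]) -> str:
        top, nxt = stack[-1], buffer[0]
        if top != 0 and gold[top] == nxt:
            return LEFT_ARC
        if gold[nxt] == top:
            return RIGHT_ARC
        # Reduce when the next token is still linked to something deeper in the stack.
        if top in heads and any(gold[nxt] == k or gold.get(k) == nxt for k in stack[:-1]):
            return REDUCE
        return SHIFT
    return guide


# Hypothetical four-token sentence ("She reads books quickly"): token 2 is the root verb.
gold_tree = {1: 2, 2: 0, 3: 2, 4: 2}
predicted = parse(4, gold_guide(gold_tree))
```

Because every step either consumes an input token or pops the stack, the loop performs at most a linear number of transitions in the sentence length, which is what makes a single deterministic pass feasible.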
Similar resources
Towards Minimal Recursion Semantics over Bulgarian Dependency Parsing
The paper discusses the transferring rules of the output from a dependency parser for Bulgarian into RMRS analyses. This task is required by the machine translation compatibility between Bulgarian and English resources. Since the Bulgarian HPSG grammar is still being developed, a repairing mechanism has been envisaged by parsing the Bulgarian data with the Malt Dependency Parser, and then retri...
Feature Engineering in Persian Dependency Parser
Dependency parser is one of the most important fundamental tools in the natural language processing, which extracts structure of sentences and determines the relations between words based on the dependency grammar. The dependency parser is proper for free order languages, such as Persian. In this paper, data-driven dependency parser has been developed with the help of phrase-structure parser fo...
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
We introduce MaltParser, a data-driven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows user-defined feature models, consisting of arbitrary combinations of lexical features, part-of-speech features and depe...
Improving data-driven dependency parsing using large-scale LFG grammars
This paper presents experiments which combine a grammar-driven and a data-driven parser. We show how the conversion of LFG output to dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements stemming from the propose...
Partial Parsing from Bitext Projections
Recent work has shown how a parallel corpus can be leveraged to build syntactic parser for a target language by projecting automatic source parse onto the target sentence using word alignments. The projected target dependency parses are not always fully connected to be useful for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm which ...